DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian
Abstract
Knowledge about derivational morphology has proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of DERIVBASE.HR, a large-coverage morphological resource for Croatian. DERIVBASE.HR groups 100k lemmas from the web corpus hrWaC into 56k clusters of derivationally related lemmas, so-called derivational families. We focus on suffixal derivation between and within nouns, verbs, and adjectives. We propose two approaches: an unsupervised one and a knowledge-based one that relies on a hand-crafted morphology model but uses no additional lexico-semantic resources. The resource acquisition procedure consists of three steps: corpus preprocessing, acquisition of an inflectional lexicon, and induction of derivational families. We describe an evaluation methodology based on manually constructed derivational families, from which we sample and annotate pairs of lemmas. We evaluate DERIVBASE.HR on the resulting sample and show that the knowledge-based version attains good clustering quality: 81.2% precision, 76.5% recall, and 78.8% F1-score. As with similar resources for other languages, we expect DERIVBASE.HR to be useful for a number of NLP tasks.
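The pair-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each sampled lemma pair carries a gold label (derivationally related or not) and that a pair counts as a positive prediction when both lemmas land in the same induced family. The function names, the toy families, and the annotated pairs are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Iterable, Set, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def pairwise_prf(
    families: Iterable[Set[str]],
    gold_pairs: Dict[Tuple[str, str], bool],
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over annotated lemma pairs.

    A pair is predicted as related when both lemmas belong to the
    same induced family (an assumption of this sketch).
    """
    # Map every lemma to the index of the family that contains it.
    family_of: Dict[str, int] = {}
    for idx, family in enumerate(families):
        for lemma in family:
            family_of[lemma] = idx

    tp = fp = fn = 0
    for (a, b), related in gold_pairs.items():
        same_family = (
            a in family_of and b in family_of and family_of[a] == family_of[b]
        )
        if same_family and related:
            tp += 1
        elif same_family and not related:
            fp += 1
        elif not same_family and related:
            fn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical induced families; "pismo" is deliberately misplaced.
families = [
    {"pisati", "pisac", "pisanje"},
    {"raditi", "radnik", "pismo"},
]

# Hypothetical gold annotations for sampled lemma pairs.
gold_pairs = {
    ("pisati", "pisac"): True,
    ("pisati", "pismo"): True,
    ("raditi", "radnik"): True,
    ("radnik", "pismo"): False,
    ("pisac", "radnik"): False,
}

precision, recall, f1 = pairwise_prf(families, gold_pairs)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")

# Consistency check on the figures reported in the abstract.
print(round(f1_score(81.2, 76.5), 1))
```

The last line only confirms that the reported F1-score is the harmonic mean of the reported precision and recall; it does not reproduce the evaluation itself.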
Similar resources
CroDeriV: a new resource for processing Croatian morphology
The paper deals with the processing of Croatian morphology and presents CroDeriV – a newly developed language resource that contains data about morphological structure and derivational relatedness of verbs in Croatian. In its present shape, CroDeriV contains 14 192 Croatian verbs. Verbs in CroDeriV are analyzed for morphemes and segmented into lexical, derivational and inflectional morphemes. T...
Derivational and Semantic Relations of Croatian Verbs
Keywords: derivational morphology, morphosemantic relations, derivational relations, prefixation, semantic relations, Croatian WordNet. This paper deals with certain morphosemantic relations between Croatian verbs and discusses their inclusion in Croatian WordNet. The morphosemantic relations in question are the semantic relations between unprefixed infinitives and their prefixed deri...
DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German
Derivational models are still an under-researched area in computational morphology. Even for German, a rather resource-rich language, there is a lack of large-coverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a high-coverage German resource, DERIVBASE,...
Morphosemantic relations between verbs in Croatian WordNet
This paper deals with morphosemantic relations between Croatian verbs and discusses their inclusion in Croatian WordNet. Morphosemantic relations refer to semantic relations between morphologically related verbs, i.e., between verbs from the same derivational family. A derivational family consists of verbs with the same lexical morpheme grouped around a base form. Generally, a verb with the sim...
The Lemlat 3.0 Package for Morphological Analysis of Latin
This paper introduces the main components of the downloadable package of the 3.0 version of the morphological analyser for Latin Lemlat. The processes of word form analysis and treatment of spelling variation performed by the tool are detailed, as well as the different output formats and the connection of the results with a recently built resource for derivational morphology of Latin. A light e...
Publication date: 2014